3D Reconstruction with Low Resolution, Small Baseline and High Radial Distortion Stereo Images
In this paper we analyze and compare approaches for 3D reconstruction from
low-resolution (250x250 pixels), high radial distortion stereo images, which are
acquired with a small baseline (approximately 1 mm). These images are acquired
with the NanEye Stereo system manufactured by CMOSIS/AWAIBA. These stereo
cameras also have small apertures, which means that high levels of illumination
are required. The goal was to develop an approach yielding accurate
reconstructions with a low computational cost, i.e., avoiding non-linear
numerical optimization algorithms. In particular, we focused on the analysis and
comparison of radial distortion models. To perform the analysis and comparison,
we defined a baseline method based on available software and methods, such as
the Bouguet toolbox [2] or the Computer Vision Toolbox from Matlab. The
approaches tested were based on the use of the polynomial model of radial
distortion and on the application of the division model. The issue of the
center of distortion was also addressed within the framework of the division
model. We concluded that the division model with a single radial distortion
parameter has limitations.
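
For reference, a minimal sketch contrasting the two radial distortion models
compared above; the coefficient values, function names, and the use of
normalized image coordinates are illustrative assumptions, not the paper's
exact formulation.

    import numpy as np

    def polynomial_distort(x_u, y_u, k1=0.1, k2=0.01):
        """Polynomial model: map undistorted to distorted normalized coords."""
        r2 = x_u**2 + y_u**2
        f = 1.0 + k1 * r2 + k2 * r2**2
        return x_u * f, y_u * f

    def division_undistort(x_d, y_d, lam=-0.2, cx=0.0, cy=0.0):
        """Single-parameter division model: map distorted points back to
        undistorted ones, given a center of distortion (cx, cy)."""
        xd, yd = x_d - cx, y_d - cy
        r2 = xd**2 + yd**2
        f = 1.0 / (1.0 + lam * r2)
        return cx + xd * f, cy + yd * f

One attraction of the single-parameter division model is that undistortion is
closed-form, which fits the stated goal of avoiding non-linear numerical
optimization.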
A Non-Rigid Map Fusion-Based RGB-Depth SLAM Method for Endoscopic Capsule Robots
In the gastrointestinal (GI) tract endoscopy field, ingestible wireless
capsule endoscopy is considered a minimally invasive, novel diagnostic
technology to inspect the entire GI tract and to diagnose various diseases and
pathologies. Since the development of this technology, medical device companies
and many research groups have made significant progress toward turning such
passive capsule endoscopes into robotic active capsule endoscopes that achieve
almost all functions of current active flexible endoscopes. However, the use of
robotic capsule endoscopy still has some challenges. One such challenge is the
precise localization of such active devices in the 3D world, which is essential
for a precise three-dimensional (3D) mapping of the inner organ. A reliable 3D
map of the explored inner organ could assist doctors in making more intuitive
and correct diagnoses. In this paper, we propose, to our knowledge for the
first time in the literature, a visual simultaneous localization and mapping
(SLAM) method specifically developed for endoscopic capsule robots. The
proposed RGB-Depth SLAM method is capable of capturing comprehensive, dense,
globally consistent surfel-based maps of the inner organs explored by an
endoscopic capsule robot in real time. This is achieved by using dense
frame-to-model camera tracking and windowed surfel-based fusion coupled with
frequent model refinement through non-rigid surface deformations.
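
To make the frame-to-model surfel fusion step concrete, here is a minimal
sketch of a confidence-weighted surfel update in the spirit of
ElasticFusion-style pipelines; the data layout, field names, and weighting
scheme are assumptions for illustration, not the paper's exact formulation.

    import numpy as np

    def fuse_surfel(surfel, pos, normal, color, w=1.0):
        """Blend a stored surfel with a new measurement using a running
        confidence-weighted average (assumed dict layout: pos, normal,
        color, conf)."""
        c = surfel['conf']
        t = c + w
        surfel['pos'] = (c * surfel['pos'] + w * pos) / t
        surfel['color'] = (c * surfel['color'] + w * color) / t
        n = c * surfel['normal'] + w * normal
        surfel['normal'] = n / np.linalg.norm(n)
        surfel['conf'] = t
        return surfel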
MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous Driving
3D object detection is a significant task for autonomous driving. Recently,
with the progress of vision transformers, the 2D object detection problem has
been treated with a set-to-set loss. Inspired by these approaches to 2D object
detection and by DETR3D, an approach for multi-view 3D object detection, we
propose MSF3DDETR: a Multi-Sensor Fusion 3D Detection Transformer architecture
that fuses image and LiDAR features to improve detection accuracy. Our
end-to-end, single-stage, anchor-free and NMS-free network takes in multi-view
images and LiDAR point clouds and predicts 3D bounding boxes. Firstly, we link
the object queries learnt from data to the image and LiDAR features using a
novel MSF3DDETR cross-attention block. Secondly, the object queries interact
with each other in a multi-head self-attention block. Finally, the MSF3DDETR
block is repeated a number of times to refine the object queries. The
MSF3DDETR network is trained end-to-end on the nuScenes dataset using
Hungarian-algorithm-based bipartite matching and a set-to-set loss inspired by
DETR. We present both quantitative and qualitative results which are
competitive with the state-of-the-art approaches.

Comment: Accepted at the ICPR 2022 Workshop DLVDR202
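
As an illustration of the DETR-style bipartite matching underlying such a
set-to-set loss, a minimal sketch built on SciPy's Hungarian solver; the cost
terms, weights, and box parameterization are simplified assumptions.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_queries(pred_boxes, pred_logits, gt_boxes, gt_labels,
                      box_w=1.0, cls_w=1.0):
        """One-to-one matching of N predicted queries to M ground-truth
        boxes, minimizing a combined classification + L1 box cost."""
        p = np.exp(pred_logits)
        p /= p.sum(-1, keepdims=True)              # softmax over classes, (N, C)
        cls_cost = -p[:, gt_labels]                # (N, M)
        box_cost = np.abs(pred_boxes[:, None, :]
                          - gt_boxes[None, :, :]).sum(-1)  # (N, M)
        return linear_sum_assignment(cls_w * cls_cost + box_w * box_cost)

In DETR-style training, only the matched pairs contribute to the box
regression loss, while unmatched queries are supervised toward a no-object
class, which is what removes the need for NMS.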
Li3DeTr: A LiDAR based 3D Detection Transformer
Inspired by recent advances in vision transformers for object detection, we
propose Li3DeTr, an end-to-end LiDAR-based 3D Detection Transformer for
autonomous driving that takes LiDAR point clouds as input and regresses 3D
bounding boxes. The LiDAR local and global features are encoded using sparse
convolution and multi-scale deformable attention, respectively. In the decoder
head, firstly, in the novel Li3DeTr cross-attention block, we link the LiDAR
global features to 3D predictions leveraging the sparse set of object queries
learnt from the data. Secondly, the object query interactions are formulated
using multi-head self-attention. Finally, the decoder layer is repeated a
number of times to refine the object queries. Inspired by DETR, we employ a
set-to-set loss to train the Li3DeTr network. Without bells and whistles, the
Li3DeTr network achieves 61.3% mAP and 67.6% NDS, surpassing state-of-the-art
methods that use non-maximum suppression (NMS) on the nuScenes dataset, and it
also achieves competitive performance on the KITTI dataset. We also employ
knowledge distillation (KD) using a teacher and student model, which slightly
improves the performance of our network.

Comment: Accepted at the IEEE/CVF Winter Conference on Applications of
Computer Vision (WACV) 202
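
To make the repeated decoder structure concrete, a minimal PyTorch-style
sketch of stacked decoder layers that interleave query self-attention with
cross-attention to encoded LiDAR features; the dimensions, module composition,
and layer count are illustrative assumptions, not the paper's exact design.

    import torch
    import torch.nn as nn

    class QueryDecoderLayer(nn.Module):
        """One decoder layer: query self-attention, cross-attention to
        encoded features, then a feed-forward block."""
        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                     nn.Linear(4 * dim, dim))
            self.n1, self.n2, self.n3 = (nn.LayerNorm(dim) for _ in range(3))

        def forward(self, q, feats):
            q = self.n1(q + self.self_attn(q, q, q)[0])
            q = self.n2(q + self.cross_attn(q, feats, feats)[0])
            return self.n3(q + self.ffn(q))

    # Repeating the layer refines the object queries before box regression.
    layers = nn.ModuleList(QueryDecoderLayer() for _ in range(6))
    queries = torch.randn(2, 300, 256)   # (batch, num_queries, dim)
    feats = torch.randn(2, 1024, 256)    # flattened encoded LiDAR features
    for layer in layers:
        queries = layer(queries, feats)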
Real-time human activity monitoring exploring multiple vision sensors
In this paper, we describe the monitoring of human activity in an indoor
environment through the use of multiple vision sensors. The system described
in this paper is made up of three cameras. Two of these cameras are active and
are part of a binocular system. The cameras operate either as a set of three
static cameras or as a set of one fixed camera and an active binocular vision
system. Human activity is monitored by extracting several parameters that are
useful for its classification. The system enables the creation of a record
based on the type of activity. These logs can be selectively accessed and
provide images of the humans in specific areas.

http://www.sciencedirect.com/science/article/B6V16-4379G7B-B/1/216c3ddb70ad57a4491b08864d1b296
Magnetic-Visual Sensor Fusion-based Dense 3D Reconstruction and Localization for Endoscopic Capsule Robots
Reliable and real-time 3D reconstruction and localization functionality is a
crucial prerequisite for the navigation of actively controlled capsule
endoscopic robots, an emerging, minimally invasive diagnostic and therapeutic
technology for use in the gastrointestinal (GI) tract. In this study, we
propose a fully dense, non-rigidly deformable, strictly real-time,
intraoperative map fusion approach for actively controlled endoscopic capsule
robot applications, which combines magnetic and vision-based localization with
non-rigid-deformation-based frame-to-model map fusion. The performance of the
proposed method is demonstrated using four different ex-vivo porcine stomach
models. Across different trajectories of varying speed and complexity, and
four different endoscopic cameras, the root mean square surface reconstruction
errors range from 1.58 to 2.17 cm.

Comment: submitted to IROS 201
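
For context on the reported metric, a minimal sketch of one common way to
compute an RMS surface reconstruction error between a reconstructed point
cloud and ground-truth surface samples; the paper's exact evaluation protocol
may differ.

    import numpy as np
    from scipy.spatial import cKDTree

    def rms_surface_error(recon_pts, gt_pts):
        """RMS of nearest-neighbor distances from each reconstructed point
        to the ground-truth surface samples (both arrays of shape (N, 3))."""
        dists, _ = cKDTree(gt_pts).query(recon_pts)
        return float(np.sqrt(np.mean(dists**2)))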